DESCRIPTION



Methods for Phenotype Creation 
From Multiple Gene Populations 

Cross Reference to Related Application 

This is a continuation-in-part application of copending
application Serial No. 513,957, filed April 24, 1990,
which is a continuation-in-part of Serial No. 353,235,
filed May 16, 1989, and Serial No. 353,241, filed May 17,
1989, the disclosures of which are hereby incorporated by
reference.

Field of the Invention 

The present invention relates to methods for randomly
combining populations of nucleotide sequences and selecting
those combinations coding for a desired predetermined
phenotype.

Background of the Invention 

The production of genetic variants, including variants
of both polypeptides and organisms such as bacteria
and phage, has been a goal in the work of many individuals
involved in recombinant DNA technologies. For example,
researchers have beneficially relied upon random genetic
recombination in the past for the production of new and
useful microorganisms. Genetic recombination includes a
variety of processes that produce new linkage relationships
of genes or parts of genes. Genetic recombination
is often subdivided into general genetic recombination,
which takes place between homologous chromosomes, more or
less anywhere along their length, and recombination that
does not require extensive homology. The latter category
includes site-specific recombination, which depends upon
the existence of specific sites in one or more molecules
and which includes interactions of viral genomes and
insertion sequences with chromosomes of prokaryotes and
eukaryotes, and less well defined instances of recombination
that appear to require neither extensive homology nor
special sites. Variable gene expression can also result
in production of various combinations of polypeptides, the
immune system being one example of such protein
combination.

The immune system of a mammal is one of the most
versatile biological systems; probably greater than
1.0 x 10^7 antibody specificities can be produced. Indeed, a
great deal of contemporary biological and medical research
is directed toward tapping this repertoire. During the
last decade, furthermore, there has been a dramatic
increase in the ability to harness the output of the
immune system. The development of the hybridoma methodology
by Kohler and Milstein has made it possible to
produce monoclonal antibodies, i.e., a composition of
antibody molecules of single epitope specificity, from the
repertoire of antibodies induced during an immune
response. Monoclonal antibodies have been generated in
the past from hybridomas, generated by fusing antibody-
secreting lymphocytes with an immortal cell line, such as
myeloma.

Although standard hybridoma technology has been
extremely valuable, the screening of fused cells to identify
hybridomas expressing useful antibody molecules is
labor intensive, time consuming and expensive. Moreover,
the standard technology yields rodent antibody molecules
that have two clear disadvantages. The first is that
subtle variations in certain human antigenic systems, such
as major histocompatibility proteins, are not easily
distinguished by non-primate antibodies. Therefore, rodent
antibodies may not provide the repertoire of specificities
needed to distinguish certain polymorphic antigenic
determinants. In other words, current methods for generating
monoclonal antibodies are not capable of efficiently
surveying the entire antibody response induced by a
particular immunogen. Thus, in an individual animal there
are at least 5-10,000 different B-cell clones capable of
generating unique antibodies to a small, relatively rigid
immunogen such as, for example, dinitrophenol. Further,
because of the process of somatic mutation during the
generation of antibody diversity, essentially an unlimited
number of unique antibody molecules may be generated. In
contrast to this vast potential for different antibodies,
current hybridoma methodologies typically yield only a few
hundred different monoclonal antibodies per fusion. A
second major drawback in hybridoma technology is that
rodent antibodies are highly immunogenic in humans, which
can preclude their continued use in patients for
diagnostic or therapeutic purposes.

One alternative is to produce human cells that
express antibody. Unfortunately, it is quite difficult to
identify and produce pure human monoclonal antibodies.
Standard methods used to immortalize antibody-producing
cells are less than satisfactory. One approach that
circumvents the need for human hybridoma cells has been to
use recombinant DNA technology to express fusion antibody
proteins. These molecules have amino terminal variable
domains of the light and heavy chains derived from a
specific rodent monoclonal antibody and the carboxy
terminal constant region domains derived from a human
antibody. The use of human constant regions diminishes
the human anti-globulin immune response, avoiding the
stimulation of anti-isotypic antibody-producing B cells.
However, the rodent-derived variable region framework
domains still elicit a response that is more severe than
a variable domain response directed against a pure human
antibody.

In an effort to avoid the anti-idiotypic response
directed against the rodent framework regions of the
domains, some researchers have taken a human antibody and
replaced the hypervariable regions (CDRs) with
hypervariable regions from a rodent antibody specific for
a selected antigen. Although such antibodies may have an
affinity for antigen comparable to the parent rodent
antibody, the process of grafting all rodent CDRs into a
human immunoglobulin gene is technically challenging.

Aside from repertoire specificity and immunogenicity,
other drawbacks in producing monoclonal antibodies with
the hybridoma methodology include genetic instability and
low production capacity of hybridoma cultures. One means
by which the art has attempted to overcome these latter
two problems has been to clone the immunoglobulin-producing
genes from a particular hybridoma of interest
into a procaryotic expression system. See, for example,
Robinson et al., PCT Publication No. WO 89/0099; Winter et
al., European Patent Publication No. 0239400; Reading,
U.S. Patent No. 4,714,681; and Cabilly et al., European
Patent Publication No. 0125023.

The immunologic repertoire of vertebrates has
recently been found to contain genes coding for
immunoglobulins having catalytic activity. Tramontano et
al., Science, 234:1566-1570 (1986); Pollack et al., Science,
234:1570-1573 (1986); Janda et al., Science, 244:437-440
(1989). The presence of, or the ability to induce the
repertoire to produce, antibody molecules capable of
catalyzing a chemical reaction, i.e., acting like enzymes,
had previously been postulated almost 20 years ago by W.
P. Jencks in Catalysis in Chemistry and Enzymology,
McGraw-Hill, N.Y. (1969).

It is believed that one reason for the art's failure to
isolate catalytic antibodies from the immunological
repertoire earlier, and for its failure to isolate many to
date even after their actual discovery, is the inability
to screen a large portion of the repertoire for the
desired activity. Another reason is believed to be the
bias of currently available screening techniques, such as
the hybridoma technique, towards the production of high
affinity antibodies inherently designed for participation
in the process of neutralization, as opposed to catalysis.

In an attempt to enhance the designed recombination
of desired DNA sequences or the desired combination of
otherwise randomly generated polypeptides, including the
identification and production of pure human monoclonal
antibodies, we have pursued alternative approaches for the
production and screening of such nucleotide sequences and
polypeptides.

Summary of the Invention 

The present invention is directed to methods for producing
biological agents having a desired novel phenotype
wherein this phenotype results from expression of a
particular combined nucleotide sequence and wherein said
phenotype can be used to identify the biological agents
having the particular combined nucleotide sequence and
distinguish them from biological agents having other
combined nucleotide sequences. The desired phenotype is
typically a phenotype which is not normally expressed by
the parent nucleotide sequences. In one embodiment, these
methods comprise first replicating at least portions of
two parent nucleotide sequences. The replicating step
yields a population of diverse replicas of parent nucleotide
sequences. In one embodiment, each parent nucleotide
sequence initially comprises a population (or family) of
diverse nucleotide sequences which is replicated to give
a population of diverse replicas. Alternatively, a
population of diverse replicas is generated by replicating a
parent nucleotide sequence under conditions which allow
mutations to occur, which generates diversity from one
parent nucleotide sequence and results in a population of
diverse replicas. In one aspect, the parent nucleotide
sequences may comprise a single DNA molecule or, alternatively,
the parent nucleotide sequences comprise separate
DNA molecules. Where the parent nucleotide sequences comprise
one DNA molecule, after replication, the resulting
populations of diverse replicas derived from each parent
nucleotide sequence are separated. The populations of
diverse replicas are then brought together, preferably in
a random manner, to produce combined nucleotide sequences
wherein each combined nucleotide sequence comprises one
member of each population of diverse replicas. The parent
nucleotide sequences may be suitably replicated and
brought together according to the various methods
described herein for replication and recombination of
nucleotide sequences and generation of combinatorial
libraries. The combined nucleotide sequences are
expressed in biological agents. Such biological agents
may comprise a host cell, or alternatively, a plasmid,
bacteriophage or virus, or nucleic acid vector, and such
suitable means for expression are described herein. In
one embodiment, expression may constitute the mere existence
of the nucleotide sequences in the same biological
agent. Then, the biological agents which express the
desired phenotype are identified. If desired, the phenotype
is used to distinguish those biological agents
expressing the particular combined nucleotide sequence
from biological agents expressing other combined nucleotide
sequences. The desired phenotype may comprise a
polypeptide, more than one polypeptide, or a multimeric
polypeptide, the expression of which is detectable.
Alternatively, the phenotype may comprise synthesis of one
or more RNA molecules. Optionally, either the polypeptides
or RNA may exhibit enzymatic activity or receptor
activity, or the DNA or RNA may simply act as a target for
interaction with other molecules.

The present invention provides novel methods for the
cloning of cells having novel phenotypes. These methods
generally include the use of a combinatorial library
selection system to generate a diverse collection of
clones. In one aspect, the methods utilize at least two
starting populations of nucleotide sequences which can be
recombined to form a library of clones containing nucleotide
sequences from each of the parent populations. These
methods can be utilized, therefore, to create cells having
novel phenotypes, that is, cells having a new and desired
combination of expressed polypeptides. These methods can
also be used for the production of new combinations of
polypeptides, including the polypeptides utilized in the
formation of biologically competent immunoglobulin molecules.
In accordance with the latter object of the
invention, these methods can be used to screen a larger
portion of the immunological repertoire for receptors
having a preselected activity than has heretofore been
possible, thereby overcoming the before-mentioned
inadequacies of the hybridoma technique.

In another embodiment, the present invention
contemplates a gene library comprising an isolated admixture
of at least about 10^3, preferably at least about 10^4
and more preferably at least 10^5 V_H- and/or V_L-coding DNA
homologs, a plurality of which share a conserved antigenic
determinant. Preferably, the homologs are present in a
medium suitable for in vitro manipulation, such as water,
phosphate buffered saline and the like, which maintains
the biological activity of the homologs.

In one embodiment, at least two starting populations
of DNA sequence-containing vectors are physically combined
by any of several techniques, including those described
herein, to form a library of clones containing DNA
sequences from each of the parent populations. Alternatively,
there may be more than two gene families and the
vectors produced thereby may contain a random assortment
of one member of each gene family to create the identifiable
characteristic. These vectors can then be
transferred to desired host cells to create in vivo novel
combinations of phenotypic characteristics in the host
cell. Methods of combining desired DNA sequences include
the use of restriction digestion and ligation, homologous
recombination, and site-specific recombination by methods
including integrase-related proteins, flp recombinase-
catalyzed recombination, the cre-lox system of bacteriophage
P1, and the use of transposons.

SUBSTITUTE SHEET 




WO 91/16427 



PCT/US91/02910 




In a still further embodiment, the present invention
contemplates vectors for use in the methods which
comprise, in addition to random DNA sequences from the
starting gene family populations, DNA sequences which
facilitate the region-specific, random recombination
together of at least one gene from each starting gene
family population. Sequences enabling the recombination
of these vectors include functional flp recombination
sequences, functional loxP recombination
sequences, att sequences recognized by integrase-related
proteins from lambdoid bacteriophages, and terminal repeat
sequences recognized by transposases. Thus, the present
invention also includes methods for the combinatorial
generation of phenotypes, including a method of producing
a nucleic acid vector encoding two or more desired genes
each from a family of genes, said genes being capable of
producing a characteristic that can be used to identify
the vector encoding said genes from other vectors encoding
other members of the families of genes, which method
comprises:

a) randomly inserting into vectors one member of a
first family of genes and one member from one or more
other families of genes so that a population of vectors
is created wherein each vector may contain one of the
genes from said first gene family and one of the genes
from each of said other gene families;

b) identifying within said population of vectors a
vector capable of detectably producing a desired characteristic
resulting from the inclusion of one gene from
said first gene family and one gene from each of said
other gene families, and using said characteristic to
distinguish the vector from other vectors within the
population containing undesired combinations of gene
members from said gene families.
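
By way of illustration only, steps (a) and (b) can be modeled in software. The following Python sketch is not part of the disclosed laboratory method. The gene names, the helper functions `random_vectors` and `identify`, and the rule that exactly one pair yields the desired characteristic are all hypothetical placeholders chosen to show the combinatorial logic.

```python
import itertools
import random


def random_vectors(first_family, other_families, n_vectors, seed=0):
    """Step (a): build n_vectors, each holding one randomly chosen member
    of the first family and one member of every other family."""
    rng = random.Random(seed)
    families = [first_family, *other_families]
    return [tuple(rng.choice(fam) for fam in families) for _ in range(n_vectors)]


def identify(vectors, has_characteristic):
    """Step (b): keep only the vectors whose gene combination produces
    the detectable characteristic."""
    return [v for v in vectors if has_characteristic(v)]


# Hypothetical gene families (placeholder names, not from the source).
heavy = ["VH1", "VH2", "VH3"]
light = ["VL1", "VL2"]

# The number of distinct combinations is the product of the family sizes.
all_combinations = list(itertools.product(heavy, light))


def desired(vector):
    """Toy screening rule: only the pair (VH2, VL1) shows the characteristic."""
    return vector == ("VH2", "VL1")


library = random_vectors(heavy, [light], n_vectors=50)
hits = identify(library, desired)
```

Under these assumptions the library contains one gene from each family per vector, and `hits` holds only those vectors carrying the combination that the toy rule treats as producing the desired characteristic. In practice the screening rule would be a physical assay rather than a comparison of names.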

Suitable vectors for use according to the methods of
the present invention include plasmid or cosmid vectors
or, alternatively, phage vectors. Suitable host cells for
expressing the vectors comprise either eukaryotic cells or
prokaryotic cells. Preferred eukaryotic cells include
mammalian cells. In one preferred aspect, the vectors
comprise lambda bacteriophage and host cells comprise E.
coli.

Preferably, the genes are combined in vivo. 

Various suitable methods may be used for the identification
of a particular vector within the recombinant
vector population. These methods include (a) the interaction
of sequence-specific nucleic acids with genes from
the individual families which were combined; (b) the
hybridization of nucleic acid probes with genes from the
gene families; (c) the expression of one or both genes
from the gene families as an RNA molecule; and (d) the
expression of one or both genes as an identifiable protein
molecule. Optionally, such an identifiable protein
molecule may contain a binding site for another molecule,
an epitope recognized by an antibody, or an immune
molecule binding site for an epitope. In a preferred
identification method, both genes express an RNA and/or
polypeptide and said RNAs and/or polypeptides physically
interact with a host to create an identifiable characteristic.
Both genes may express polypeptides that physically
interact to form a neo-epitope recognized by an
immune molecule or polypeptides that physically interact
to form a binding site for another molecule. Optionally,
those polypeptides are derived from antibody genes such
that the interaction of both polypeptides forms an antigen
binding site.

In another preferred aspect, the vectors produced
according to the present invention contain a single
promoter that expresses the genes from the gene families.
Alternatively, the genes from the gene families are each
expressed from their own promoter.

In a still further embodiment, the present invention
contemplates the creation of combinations of two or more
nucleotide sequence families (or populations) by in vitro
recombination. Such in vitro recombination could be
carried out using specific recombination target sequences
and specific recombinases (like flp recombinase), or by
using homologous sequences shared by both nucleotide
sequence populations to facilitate homologous
recombination.

One method to accomplish a form of homologous
recombination in vitro is by using in vitro nucleic acid
amplification methods such as the polymerase chain reaction
(PCR). If both of two populations of DNA sequences
share a region of homology, then it is possible during the
PCR for base-pairing to occur between single stranded
nucleic acid molecules from both populations of nucleotide
sequences. If such base pairing creates a "primer-template
complex" that can be used by a polymerase to
begin synthesis of complementary strands, then a fusion
product is created which will contain sequences from both
nucleotide sequence populations (see Figure 21). If
the shared region of homology is present on most or all of
the two nucleotide sequence populations, then most or all
of the nucleotide sequences can participate in such
recombination. Thus, a combinatorial population of fusion
nucleotide sequences can be produced, and subsequently
inserted into a single expression vector for expression of
the nucleotide sequences from both sequence families. Such
a combinatorial population of expressed sequences can then
be screened for new phenotypes that would not be present
if the sequences from only one population of nucleotide
sequences were expressed, and would be present only with
expression of particular combinations comprising a nucleotide
sequence from each population. For example, such
phenotypes could comprise the creation of heterodimeric
proteins where one subunit of the dimer is encoded by one
nucleotide sequence family and the other subunit of the
dimer is encoded by the other nucleotide sequence family.
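
The overlap-driven fusion described above can be illustrated with a short Python sketch. It is a simplified string model offered only as an illustration, not the disclosed procedure: both sequences are written in the same sense, so the strand annealing and polymerase extension of an actual PCR are reduced to matching a shared end region. The `LINKER` sequence, the example populations and the function names are hypothetical.

```python
def fuse_by_overlap(upstream, downstream, min_overlap=6):
    """Join two sequences whose ends share a region of homology.

    Returns the fused sequence, or None when the end of `upstream` and the
    start of `downstream` share fewer than `min_overlap` bases.
    """
    longest = min(len(upstream), len(downstream))
    # Try the longest possible overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    return None


def fusion_library(population_a, population_b, min_overlap=6):
    """Attempt to fuse every member of one population with every member
    of the other, keeping the successful fusion products."""
    fused = []
    for a in population_a:
        for b in population_b:
            product = fuse_by_overlap(a, b, min_overlap)
            if product is not None:
                fused.append(product)
    return fused


LINKER = "GGATCCGGA"  # hypothetical shared region of homology
pop_a = ["ATGAAA" + LINKER, "ATGCCC" + LINKER]
pop_b = [LINKER + "TTTTAA", LINKER + "GGGTAA", LINKER + "CACTAA"]

fusions = fusion_library(pop_a, pop_b)
```

Because every member of both example populations carries the shared region, all 2 x 3 pairings fuse, which mirrors the statement that most or all sequences can participate when the homology is present throughout both populations.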

Thus, the present invention is directed to methods of
creating diversity, namely populations of diverse replicas
of nucleotide sequences which may be combined to give a
diversity of phenotypes, from which a desired phenotype
may be selected. Such diversity may be generated starting
with a single DNA molecule which is treated to create
diversity, such as by mutagenesis, or by starting with a
family of nucleotide sequences (or genes) or a
combinatorial library.

For example, one may start with a plasmid containing
antibody sequences coding for both a light chain and a
heavy chain which has been isolated from a known
monoclonal-antibody producing cell line. The nucleotide
sequences coding for the light chain and the heavy chain
may be individually amplified (using a method such as PCR)
under conditions in which mutated sequences are generated,
to create a population of mutated sequences. The individual
populations of mutated sequences may be used to make
combinatorial libraries which are then used to create novel
phenotypes. Alternatively, these individual populations
of mutated sequences may be combined using techniques such
as fusion polynucleotide amplification (for example,
fusion PCR as described herein) and used to generate
novel phenotypes. These novel phenotypes may include
antibodies having enhanced antigen binding
characteristics.
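
As a rough illustration of how amplification under mutagenic conditions turns one parent sequence into a population of diverse replicas, the following Python sketch copies a sequence with a per-base substitution probability. It is an assumption-laden toy model, not the disclosed protocol: the parent sequences are placeholders, the `error_rate` value is arbitrary, and only substitutions are modeled (no insertions or deletions).

```python
import random

BASES = "ACGT"


def mutagenic_copy(sequence, error_rate, rng):
    """Copy a sequence, replacing each base by a different base with
    probability `error_rate` (substitutions only)."""
    copied = []
    for base in sequence:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def mutated_population(parent, n_copies, error_rate=0.02, seed=0):
    """Generate n_copies replicas of `parent`, each copied independently."""
    rng = random.Random(seed)
    return [mutagenic_copy(parent, error_rate, rng) for _ in range(n_copies)]


# Placeholder parent sequences standing in for heavy and light chain genes.
heavy_parent = "ATGGCTCAGGTGCAGCTG"
light_parent = "ATGGACATCCAGATGACC"

heavy_pop = mutated_population(heavy_parent, 100, seed=1)
light_pop = mutated_population(light_parent, 100, seed=2)
```

The two resulting populations could then be paired at random or fused, in the manner of the combinatorial libraries described in the text.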

According to another aspect of the present invention,
one or more genetically distinct phage may be lytically
replicated, under conditions which are somewhat mutagenic, to
generate a population(s) of diverse phage. Phage having
phenotypes distinct from the originals may be generated by
cleavage, such as by a restriction endonuclease, followed
by mixing of phage populations, and ligation, followed by
selection for expression of desired phenotypes. In this
way phage having diverse phenotypes distinct from the
parental phage may be generated combinatorially.

In another embodiment, the methods are utilized to
produce novel human antibody-expressing DNA sequences.
First, an immunoglobulin heavy chain variable region V_H
gene library containing a substantial portion of the V_H
gene repertoire of a vertebrate is synthesized. In preferred
embodiments, the V_H-coding gene library contains at
least about 10^3, preferably at least about 10^4 and
more preferably at least about 10^5 different V_H-coding
nucleic acid strands referred to herein as V_H-coding DNA
homologs.

The gene library can be synthesized by various
methods, depending on the starting material. Where the
starting material is a plurality of V_H-coding genes, the
repertoire is subjected to two distinct primer extension
reactions. The first primer extension reaction uses a
first polynucleotide synthesis primer capable of initiating
the first reaction by hybridizing to a nucleotide
sequence conserved (shared by a plurality of genes) within
the repertoire. The first primer extension reaction
produces a plurality of different V_H-coding homolog
complements (nucleic acid strands complementary to the
genes in the repertoire). The second primer extension
reaction produces, using the complements as templates, a
plurality of different V_H-coding DNA homologs. The second
primer extension reaction uses a second polynucleotide
synthesis primer that is capable of initiating the second
reaction by hybridizing to a nucleotide sequence conserved
among a plurality of V_H-coding gene complements.
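
The two primer extension reactions can be sketched as string operations in Python. This is a minimal illustration under stated assumptions rather than the disclosed method: the conserved sequences `FIVE_CONS` and `THREE_CONS`, the variable parts and the function names are hypothetical, all strands are written 5' to 3', and a primer is taken to anneal only where its exact reverse complement occurs in the template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_extend(template, primer):
    """Extend `primer` along `template` (both written 5'->3').

    The primer anneals where the template contains its reverse complement;
    the product copies the template from that site to the template's 5' end.
    Returns None if the primer finds no annealing site.
    """
    site = reverse_complement(primer)
    start = template.find(site)
    if start == -1:
        return None
    return reverse_complement(template[: start + len(site)])


def extend_all(templates, primer):
    """Run one primer extension reaction over a population of templates."""
    products = (primer_extend(t, primer) for t in templates)
    return [p for p in products if p is not None]


FIVE_CONS = "CAGGTGCAGCTG"    # hypothetical conserved 5' region
THREE_CONS = "GTCACCGTCTCC"   # hypothetical conserved 3' region
variable_parts = ["AAATTTCCC", "GGGAAATTT"]

genes = ["TT" + FIVE_CONS + v + THREE_CONS + "AA" for v in variable_parts]
genes.append("TT" + FIVE_CONS + "ACGTACGTA" + "TTTTTTTTTTTT" + "AA")  # lacks the 3' site

first_primer = reverse_complement(THREE_CONS)   # hybridizes to the genes
second_primer = FIVE_CONS                       # hybridizes to the complements

complements = extend_all(genes, first_primer)        # first reaction
homologs = extend_all(complements, second_primer)    # second reaction
```

In this model only genes carrying the conserved site give a complement in the first reaction, and the second reaction returns homologs bounded by the two conserved regions, which is the sense in which the primers select a plurality of related genes from the repertoire.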

Where the starting material is a plurality of
complements of different V_H-coding genes provided by a
method other than the first primer extension reaction, the
repertoire is subjected to the above-discussed second
primer extension reaction. That is, where the starting
material is a plurality of different V_H-coding gene
complements produced by a method such as denaturation of
double strand genomic DNA, chemical synthesis and the
like, the complements are subjected to a primer extension
reaction using a polynucleotide synthesis primer that
hybridizes to a plurality of the different V_H-coding gene
complements provided. Of course, if both a repertoire of
V_H-coding genes and their complements are present in the
starting material, both approaches can be used in
combination.

A V_H-coding DNA homolog, i.e., a gene coding for a
receptor capable of binding the preselected ligand, is
then segregated from the library to produce the isolated
gene. This may be accomplished by operatively linking for
expression a plurality of the different V_H-coding DNA
homologs of the library to an expression vector. The V_H-
expression vectors so produced are introduced into a population
of compatible host cells, i.e., cells capable of
expressing a gene operatively linked for expression to the
vector. The transformants are cultured under conditions
for expressing the receptor coded for by the V_H-coding DNA
homolog. The transformants are cloned and the clones are
screened for expression of a receptor that binds the
preselected ligand. Any of the suitable methods well known
in the art for detecting the binding of a ligand to a
receptor can be used. A transformant expressing the
desired activity is then segregated from the population to
produce the isolated gene.

A receptor having a preselected activity produced by
a method of the present invention, preferably a V_H or F_v as
described herein, is also contemplated.

The present invention also encompasses products
produced by the methods of the invention, such as the
biological agents produced thereby, as well as the expression
products of these methods such as polypeptides and nucleic
acids, vectors produced and kits comprising any of the
products of the claimed methods.

Brief Description of the Drawings 

In the drawings forming a portion of this disclosure:

Figure 1 illustrates a schematic diagram of the
immunoglobulin molecule showing the principal structural
features. The circled area on the heavy chain represents
the variable region (V_H); a polypeptide containing a
biologically active (ligand binding) portion of that
region, and a gene coding for that polypeptide, are
produced by the methods of the present invention.
Sequences L03, L35, L47 and L48 could not be classified
into any predefined subgroups.

Figure 2A is a diagrammatic sketch of an H chain of
human IgG (IgG1 subclass). Numbering is from the N-terminus
on the left to the C-terminus on the right. Note
the presence of four domains, each containing an intrachain
disulfide bond (S-S) spanning approximately 60 amino
acid residues. The symbol CHO stands for carbohydrate.
The V region of the heavy (H) chain (V_H) resembles V_L in
having three hypervariable CDR (not shown).

Figure 2B is a diagrammatic sketch of a human κ chain
(Panel 1). Numbering is from the N-terminus on the left
to the C-terminus on the right. Note the intrachain
disulfide bond (S-S) spanning about the same number of
amino acid residues in the V_L and C_L domains. Panel 2
shows the locations of the complementarity-determining
regions (CDR) in the V_L domain. Segments outside the CDR
are the framework segments (FR).

Figure 3 depicts the amino acid sequence of the V_H
regions of 19 mouse monoclonal antibodies with specificity
for phosphorylcholine. The designation HP indicates that
the protein is the product of a hybridoma. The remainder
are myeloma proteins. (From Gearhart et al., Nature,
291:29, 1981.)

Figure 4 illustrates the results obtained from PCR
amplification of mRNA obtained from the spleen of a mouse
immunized with FITC. Lanes R17-R24 correspond to amplification
reactions with the unique 5' primers (2-9, Table
1) and the 3' primer (12, Table 1); R16 represents the PCR
reaction with the 5' primer containing inosine (10, Table
1) and 3' primer (12, Table 1). Z and R9 are the amplification
controls; control Z involves the amplification of
V_H from a plasmid (PLR2) and R9 represents the amplification
from the constant regions of spleen mRNA using primers 11
and 13 (Table 1).

Figure 5 depicts nucleotide sequences of clones from
the cDNA library of the PCR amplified V_H regions in Lambda
ZAP vector. The N-terminal 110 bases are listed here and
the underlined nucleotides represent CDR1 (complementarity
determining region).

Figures 6A and 6B depict the sequence of the
synthetic DNA insert inserted into Lambda Zap vector to
produce the Lambda Zap II V_H (6A) and Lambda Zap V_L (6B)
expression vectors. The various features required for
this vector to express the V_H- and V_L-coding DNA homologs
include the Shine-Dalgarno ribosome binding site, a leader
sequence to direct the expressed protein to the periplasm
as described by Mouva et al., J. Biol. Chem., 1980,
and various restriction enzyme sites used to operatively
link the V_H and V_L homologs to the expression
vector. The V_H expression-vector sequence also contains a
short nucleic acid sequence that codes for amino acids
typically found in variable regions heavy chain (V_H
Backbone). This V_H Backbone is just upstream and in the
proper reading frame as the V_H DNA homologs that are operatively
linked into the Xho I and Spe I sites. The V_L DNA homologs are
operatively linked into the V_L sequence (6B) at the Nco I
and Spe I restriction enzyme sites and thus the V_H Backbone
region is deleted when the V_L DNA homologs are operatively
linked into the V_L vector.

Figure 7 depicts the major features of the bacterial
expression vector Lambda Zap II V_H (V_H-expression vector).
The synthetic DNA sequence from Figure 6 is
shown at the top along with the T3 polymerase promoter from
Lambda Zap II vector. The orientation of the insert in
Lambda Zap II vector is shown. The V_H DNA homologs are
inserted into the Xho I and Spe I restriction enzyme
sites. The V_H DNA homologs are inserted into the Xho I and Spe I
sites and the read through transcription produces the
decapeptide epitope (tag) that is located just 3' of the
cloning sites.

Figure 8 depicts the major features of the bacterial
expression vector Lambda Zap II V_L (V_L expression vector).
The synthetic sequence shown in Figure 6B is
shown at the top along with the T3 polymerase promoter from
Lambda Zap II vector. The orientation of the insert in
Lambda Zap II vector is shown. The V_L DNA homologs are
inserted into the phagemid that is produced by the in vivo
excision protocol described by Short et al., Nucleic Acids
Res., 16:7583-7600, 1988. The V_L DNA homologs are inserted
into the Nco I and Spe I cloning sites of the phagemid.

Figure 9 depicts a modified bacterial expression
vector Lambda Zap II V_L II. This vector is constructed by
inserting this synthetic DNA sequence,

TGAATTCTAAACTAGTCGCCAAGGAGACAGTCATAATGAA 

TCGAACTTAAGATTTGATCAGCGGTTCCTCTGTCAGTATTACTT 

ATACCTATTGCCTACGGCAGCCGCTGGATTGTTATTACTCGCTG 

TATGGATAACGGATGCCGTCGGCGACCTAACAATAATGAGCGAC 
CCCAACCAGCCATGGCCGAGCTCGTCAGTTCTAGAGTTAAGCGGCCG

GGGTTGGTCGGTACCGGCTCGAGCAGTCAAGATCTCAATTCGCCGGCAGCT 
into Lambda Zap II vector that has been digested with the
restriction enzymes Sac I and Xho I. This sequence
contains the Shine-Dalgarno sequence (ribosome binding
site), the leader sequence to direct the expressed protein
to the periplasm and the appropriate nucleic acid sequence
to allow the V_L DNA homologs to be operatively linked into
the Sac I and Xba I restriction enzyme sites provided by
this vector.

Figure 10 depicts the sequence of the synthetic DNA
segment inserted into Lambda Zap II vector to produce the
Lambda V L II-expression vector. The various features and
restriction endonuclease recognition sites are shown.

Figure 11 depicts the vectors for expressing V H and V L
separately and in combination. The various essential
components of these vectors are shown. The light chain
vector or V L expression vector can be combined with the V H
expression vector to produce a combinatorial vector
containing both V H and V L operatively linked for expression
to the same promoter.

Figure 12 depicts the labelled proteins
immunoprecipitated from E. coli containing a V H and a V L
DNA homolog. In lane 1, the background proteins
immunoprecipitated from E. coli that do not contain a V H or
V L DNA homolog are shown. Lane 2 contains the V H protein
immunoprecipitated from E. coli containing only a V H DNA
homolog. In lanes 3 and 4, the comigration of a V H
protein and a V L protein immunoprecipitated from E. coli
containing both a V H and a V L DNA homolog is shown. In
lane 5 the presence of V H protein and V L protein expressed
from the V H and V L DNA homologs is demonstrated by the two
distinguishable protein species. Lane 5 contains the
background proteins immunoprecipitated by anti-E. coli
antibodies present in mouse ascites fluid.

Figure 13 depicts the transition state analogue
(formula 1) which induces antibodies for hydrolyzing
carboxamide substrate (formula 2). The compound of
formula 1 containing a glutaryl spacer and a
N-hydroxysuccinimide-linker appendage is the form used to
couple the hapten (formula 1) to protein carriers KLH and
BSA, while the compound of formula 3 is the inhibitor.
The phosphonamidate functionality is a mimic of the
stereoelectronic features of the transition state for
hydrolysis of the amide bond.

Figure 14 illustrates the PCR amplification of Fd and
kappa regions from the spleen mRNA of a mouse immunized
with NPN. Amplification was performed as described in
Example 17 using RNA-cDNA hybrids obtained by the reverse
transcription of the mRNA with primers specific for
amplification of light chain sequences (Table 2) or heavy
chain sequences (Table 1). Lanes F1-F8 represent the product
of heavy chain amplification reactions with one of each of
the eight 5' primers (primers 2-9, Table 1) and the unique
3' primer (primer 15, Table 2). Light chain (k)
amplifications with the 5' primers (primers 3-6, and 12,
respectively, Table 2) are shown in lanes F9-F13. A band of
700 bp is seen in all lanes indicating the successful
amplification of Fd and k regions.

Figure 15 depicts the screening of phage libraries
for antigen binding according to Example 17C.
Duplicate plaque lifts of Fab (filters A, B), heavy chain
(filters E, F) and light chain (filters G, H) expression
libraries were screened against 125I-labelled BSA conjugated
with NPN at a density of approximately 30,000 plaques per
plate. Filters C and D illustrate the duplicate secondary
screening of a cored positive from a primary filter A
(arrows) as discussed in the text.

Screening employed standard plaque lift methods. XL1
Blue cells infected with phage were incubated on 150 mm
plates for 4 hours at 37°C, protein expression induced by
overlay with nitrocellulose filters soaked in 10 mM
isopropyl thiogalactoside (IPTG) and the plates incubated at
25°C for 8 hours. Duplicate filters were obtained during a
second incubation employing the same conditions. Filters
were then blocked in a solution of 1% BSA in PBS for 1
hour before incubation with rocking at 25°C for 1 hour with
a solution of 125I-labelled BSA conjugated to NPN (2 x 10^6
cpm ml^-1; BSA concentration at 0.1 µM; approximately 15 NPN
per BSA molecule) in 1% BSA/PBS. Background was reduced
by pre-centrifugation of stock radiolabelled BSA solution
at 100,000 g for 15 minutes and pre-incubation of solutions
with plaque lifts from plates containing bacteria
infected with a phage having no insert. After labeling,
filters were washed repeatedly with PBS/0.05% Tween 20
before development of autoradiographs overnight.

Figure 16 depicts the specificity of antigen binding
as shown by competitive inhibition according to
Example 17C. Filter lifts from positive plaques
were exposed to 125I-BSA-NPN in the presence of increasing
concentrations of the inhibitor NPN.

In this study a number of phages correlated with NPN
binding as in Figure 15 were spotted (about 100 particles
per spot) directly onto a bacterial lawn. The plate was
then overlaid with an IPTG-soaked filter and incubated for
19 hours at 25°C. The filters were then blocked in 1% BSA
in PBS prior to incubation in 125I-BSA-NPN as described
previously in Figure 15 except with the inclusion of
varying amounts of NPN in the labeling solution. Other
conditions and procedures were as in Figure 15. The
results for a phage of moderate affinity are shown in
duplicate in the figure. Similar results were obtained
for four other phages with some differences in the
effective inhibitor concentration ranges.

Figure 17 depicts the characterization of an antigen
binding protein according to Example 17D.
The concentrated partially purified bacterial supernate of
an NPN-binding clone was separated by gel filtration and
aliquots from each fraction applied to microtitre plates
coated with BSA-NPN. Addition of either anti-decapeptide
( ) or anti-kappa chain antibodies conjugated with
alkaline phosphatase was followed by color development. The
arrow indicates the position of elution of a known Fab
fragment. The results show that antigen binding is a
property of a 50 kD protein containing both heavy and light
chains.

Single plaques of two NPN-positive clones (Figure 15)
were picked and the plasmid containing the heavy and light
chain inserts excised. 500 ml cultures in L-broth were
inoculated with 3 ml of a saturated culture containing the
excised plasmids and incubated for 4 hours at 37°C.
Protein synthesis was induced by the addition of IPTG to
a final concentration of 1 mM and the cultures incubated
for 10 hours at 25°C. 200 ml of cell supernate were
concentrated to 2 ml and applied to a TSK-G4000 column.
50 µl aliquots from the eluted fractions were assayed by
ELISA.

For ELISA analysis, microtitre plates were coated
with BSA-NPN at 1 µg/ml, 50 µl samples mixed with 50 µl
PBS-Tween 20 (0.05%)-BSA (0.1%) added and the plates
incubated for 2 hours at 25°C. After washing with
PBS-Tween 20-BSA, 50 µl of appropriate concentrations of a
rabbit anti-decapeptide antibody (20) and a goat
anti-mouse kappa light chain (Southern Biotech) antibody
conjugated with alkaline phosphatase were added and
incubated for 2 hours at 25°C. After further washing, 50
µl of p-nitrophenyl phosphate (1 mg/ml in 0.1 M Tris pH 9.5
containing 50 mM MgCl2) were added and the plates incubated
for 15-30 minutes before reading the OD at 405 nm.

Figure 18A depicts the major features of the
bacterial expression vector HCFLP containing a V H DNA
homolog and a flp recombination site.

Figure 18B depicts the major features of the
bacterial expression vector LCFLP containing a V L DNA
homolog and a flp recombination site properly oriented for
recombination with the HCFLP vector.

Figure 19 depicts a diagrammatic sketch of bacterial
coinfection with HCFLP and LCFLP vectors for the production
of recombinant expression vectors containing V L and V H
DNA homologs.

Figure 20 depicts an outline showing arm selection
for heavy and light chain recombinant vector products
using flp recombinase in conjunction with selection based
on the inclusion of genes having amber mutations.

Figure 21 shows an outline of a method of phenotype 
creation using the fusion PCR process described herein. 

Figure 22 illustrates human fusion PCR inside
primers. The heavy chain C H 1' inside primer sequence is
written 3' to 5' and the light chain V L inside primer
sequence is written 5' to 3'. Note that it is not the
primer strands that cross-prime to create the fusion
molecule, but the complementary PCR product strands.
Boxed nucleotides represent regions where the C H 1' primer
hybridizes to the 3' end of C H 1 on human IgG heavy chain
mRNA or where the V L primer hybridizes to the 5' end of V L
framework-1 on human kappa light chain cDNA. Underlined
sequences indicate the two stop codons. The italicized
amino acid and nucleotides indicate changes in sequence
from the original pelB leader sequence. The mouse
fusion-PCR internal primers overlap in a similar manner.

Figure 23 illustrates an ethidium bromide stained
agarose gel. After PCR amplification from human cloned
DNA of heavy chain alone (HC), light chain alone (LC), and
the heavy/light dicistronic DNA molecule (H/L), DNA samples
were electrophoresed. The expected sizes of the HC,
LC, and H/L products visualized on the gel were
approximately 730, 690, and 1,390 base pairs, respectively.

Figures 24A and 24B illustrate the major features of
the bacterial expression vector Lambda ZAP II Modified V H
(Modified ImmunoZAP H) (V H-expression vector) (IZ H). The
amino acids encoded by the synthetic DNA sequence from
Figure 24A are shown along with the T3 polymerase promoter
from Lambda ZAP II. The orientation of the insert in
Lambda ZAP II is as presented. The insert was modified by
the elimination of the Sac I site between the T3 polymerase
and Not I site and by the change of amino acids at the 5'
end of the heavy chain from QVKL to QVQL (a lysine residue
was changed to a glutamine residue). The V H and V L DNA
homologs were inserted into the Xho I and Xba I cloning
sites of the phagemid as described in Figure 26 and shown
in Figure 24B. The modifications were made to create a
fusion-PCR library from hybridoma RNA, to overcome
decreased efficiency of secretion of positively charged
amino acids in the amino terminus of the protein, Inouye
et al., Proc. Natl. Acad. Sci., USA, 85:7685-7689 (1988),
and to make the V L Sac I cloning site a unique restriction
site.

Figures 25A and 25B illustrate the sequences of the
synthetic DNAs inserted into Lambda ZAP to produce Lambda
Zap II V H (ImmunoZAP H) (25A) and Lambda Zap V L (ImmunoZAP
L) (25B) expression vectors. The various features


required for these vectors to express the V H and V L-coding
DNA homologs include the Shine-Dalgarno ribosome binding
site, a leader sequence to direct the expressed protein to
the periplasm as described by Mouva et al., J. Biol.
Chem., 255:27, 1980, and various restriction enzyme sites
used to operatively link the V H and V L homologs to the
expression vector. The V H expression-vector sequence also
contains a short nucleic acid sequence that codes for
amino acids typically found in variable regions of the
heavy chain (V H Backbone). This V H Backbone is just
upstream of, and in the proper reading frame with, the V H
DNA homologs that are operatively linked into the Xho I and
Spe I restriction sites. The V L DNA homologs are
operatively linked into the V L sequence (25B) at the Sac I
and Xba I restriction enzyme sites.

Figure 26 illustrates the major features of the
bacterial expression vector Lambda Zap II V H (ImmunoZAP H)
(V H-expression vector). The amino acids encoded by the
synthetic DNA sequence from Figure 25A are shown at the top
along with the T3 polymerase promoter from Lambda Zap II.
The orientation of the insert in Lambda Zap II is as
presented. The V H DNA homologs were inserted into the
phagemid that is produced by the in vivo excision protocol
described by Short et al., Nucleic Acids Res.,
16:7583-7600, 1988. The V H DNA homologs were inserted into
the Xho I and Spe I restriction enzyme sites. The read
through transcription produces the decapeptide epitope
(tag) that is located just 3' of the cloning sites.

Figure 27 illustrates the major features of the
bacterial expression vector Lambda Zap II V L (ImmunoZAP L)
(V L expression vector). The amino acids encoded by the
synthetic DNA sequence shown in Figure 25B are shown at the
top along with the T3 polymerase promoter from Lambda Zap
II. The orientation of the insert in Lambda Zap II is as
presented. The V L DNA homologs are inserted into the Sac
I and Xba I cloning sites of the phagemid as described in
Figure 26.






Figure 28 illustrates an autoradiogram showing
signals obtained from human phage clones. Approximately
100 lambda phage were spotted onto [...]
thiogalactoside to induce Fab expression. Following
overnight incubation, the filters [...]
tetanus toxoid probe. After washing, the filters were
exposed to X-ray film. The column on the right represents
the parental clones that were selected from a
combinatorial library. Mullinax et al., [...]
with tetanus toxoid. Clones 10C1 and [...] modified
Fabs that react with tetanus toxoid. [...]
heavy chain ImmunoZAP H vector without an insert.

Detailed Description of the Invention

A. Definitions.

As used herein, the following terms have the
following meanings unless expressly stated to the contrary:

Nucleotide: a monomeric unit of DNA or RNA consisting
of a sugar moiety (pentose), a phosphate, and a
nitrogenous heterocyclic base. The base is linked to the
sugar moiety via the glycosidic carbon (1' carbon of the
pentose) and that combination of base and sugar is a
nucleoside. When the nucleoside contains a phosphate
group bonded to the 3' or 5' position of the pentose it is
referred to as a nucleotide.

Base Pair (bp): a pairing (by hydrogen bonding) of
adenine (A) with thymine (T), or of cytosine (C) with
guanine (G) in a double stranded DNA molecule. In RNA,
uracil (U) is substituted for thymine.

Nucleic Acid: a polymer of nucleotides, either
single or double stranded.

Gene: a nucleic acid whose nucleotide sequence codes 

for an RNA or polypeptide. A gene can be either RNA or 
DNA. 



Complementary Bases: nucleotides that normally pair 

up when DNA or RNA adopts a double stranded configuration. 

Complementary Nucleotide Sequence: a sequence of
nucleotides in a single-stranded molecule of DNA or RNA
that is sufficiently complementary to that on another
single strand to specifically hybridize to it with
consequent hydrogen bonding.
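
By way of illustration only, the base-pairing rules in the foregoing definitions can be sketched in Python. The function names are hypothetical and form no part of the disclosure, and the sketch tests only for exact complementarity, whereas the definition above requires merely that the strands be sufficiently complementary to hybridize.

```python
# Watson-Crick pairing for DNA; for RNA, U would take the place of T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def is_exact_complement(strand_a: str, strand_b: str) -> bool:
    """True when every base of strand_a pairs with strand_b (antiparallel)."""
    return strand_a.upper() == reverse_complement(strand_b)


if __name__ == "__main__":
    # Two short illustrative strands, both written 5' to 3'.
    print(is_exact_complement("ATGC", "GCAT"))
```

Both strands are written 5' to 3', so the second strand is reversed before the bases are compared.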

Conserved: a nucleotide sequence is conserved with
respect to a preselected (reference) sequence if it
non-randomly hybridizes to an exact complement of the
preselected sequence.

Hybridization: the pairing of substantially
complementary nucleotide sequences (strands of nucleic
acid) to form a duplex or heteroduplex by the
establishment of hydrogen bonds between complementary base
pairs. It is a specific, i.e. non-random, interaction
between two complementary polynucleotides that can be
competitively inhibited.



Nucleotide Analog: a purine or pyrimidine nucleotide
that differs structurally from A, T, G, C, or U, but is
sufficiently similar to substitute for the normal
nucleotide in a nucleic acid molecule.

DNA Homolog: a nucleic acid having a preselected
conserved nucleotide sequence and a sequence coding for a
receptor capable of binding a preselected ligand.

Receptor : a receptor is a molecule, such as a 

protein, glycoprotein and the like, that can specifically 
(non-randomly) bind to another molecule. 

Antibody: The term antibody in its various
grammatical forms is used herein to refer to immunoglobulin
molecules and immunologically active portions of
immunoglobulin molecules, i.e., molecules that contain an
antibody combining site or paratope. Exemplary antibody
molecules are intact immunoglobulin molecules,
substantially intact immunoglobulin molecules and portions
of an immunoglobulin molecule, including those portions
known in the art as Fab, Fab', F(ab')2 and F(v).

Antibody Combining Site: An antibody combining site
is that structural portion of an antibody molecule
comprised of heavy and light chain variable and
hypervariable regions that specifically binds (immunoreacts
with) an antigen. The term immunoreact in its various forms
means specific binding between an antigenic
determinant-containing molecule and a molecule containing an
antibody combining site such as a whole antibody molecule
or a portion thereof.

Monoclonal Antibody: The phrase monoclonal antibody
in its various grammatical forms refers to a population of
antibody molecules that contains only one species of
antibody combining site capable of immunoreacting with a
particular antigen. A monoclonal antibody thus typically
displays a single binding affinity for any antigen with
which it immunoreacts. A monoclonal antibody may
therefore contain an antibody molecule having a plurality
of antibody combining sites, each immunospecific for a
different antigen, e.g., a bispecific monoclonal antibody.

Upstream: In the direction opposite to the direction
of DNA transcription, and therefore going from 5' to 3' on
the non-coding strand, or 3' to 5' on the mRNA.

Downstream: Further along a DNA sequence in the
direction of sequence transcription or read out, that is
traveling in a 3'- to 5'-direction along the non-coding
strand of the DNA or 5'- to 3'-direction along the RNA
transcript.

Cistron: Sequence of nucleotides in a DNA molecule 

coding for an amino acid residue sequence. 

Stop Codon: Any of three codons that do not code for
an amino acid, but instead cause termination of protein
synthesis. They are UAG, UAA and UGA. Also referred to
as a nonsense or termination codon.

Leader Polypeptide: A short length of amino acid
sequence at the amino end of a protein, which carries or
directs the protein through the inner membrane and so
ensures its eventual secretion into the periplasmic space
and perhaps beyond. The leader sequence peptide is
commonly removed before the protein becomes active.

Reading Frame: Particular sequence of contiguous
nucleotide triplets (codons) employed in translation. The
reading frame depends on the location of the translation
initiation codon.
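
As a non-limiting illustration of the terms stop codon and reading frame, the following Python sketch splits an RNA sequence into contiguous triplets from a chosen starting position and reports the first in-frame stop codon. The function names and the example sequence are hypothetical and are used for illustration only.

```python
# The three stop codons named in the definition above.
STOP_CODONS = {"UAG", "UAA", "UGA"}


def codons(rna: str, frame_start: int = 0) -> list[str]:
    """Split an RNA string into contiguous triplets beginning at frame_start."""
    rna = rna.upper()
    return [rna[i:i + 3] for i in range(frame_start, len(rna) - 2, 3)]


def first_stop(rna: str, frame_start: int = 0):
    """Return the codon index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(codons(rna, frame_start)):
        if codon in STOP_CODONS:
            return index
    return None


if __name__ == "__main__":
    message = "AUGGCCUAAGGG"          # illustrative sequence only
    print(codons(message))            # frame set by the initiation codon AUG
    print(first_stop(message))        # stop codon read in that frame
    print(first_stop(message, 1))     # shifted frame: the stop is not read
```

Shifting the starting position by one nucleotide changes every triplet, which is why the same sequence may or may not present a stop codon depending on the frame.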

Inside Primer: An inside primer is a polynucleotide
that has a priming region located at the 3' terminus of
the primer which typically consists of 15 to 30 nucleotide
bases. The 3'-terminal priming portion is capable of
acting as a primer to catalyze nucleic acid synthesis.
The 5'-terminal portion comprises a non-priming
portion.

Outside Primer: An outside primer comprises a
3'-terminal priming portion and a portion that may define an
endonuclease restriction site which is typically located
in a 5'-terminal non-priming portion of the outside
primer.

Fusion Polynucleotide Amplification: refers to in
vitro techniques of generating multiple complementary
copies of a nucleic acid template which comprises
nucleotide sequences which have been randomly combined to
give a combined nucleic acid sequence. These techniques
typically employ complementary primers which hybridize to
the template and are extended in a primer extension reaction.
The polymerase chain reaction (PCR) techniques described
herein comprise a preferred method of nucleotide sequence
amplification. Generation and amplification of a
combined nucleotide sequence using fusion PCR is further
described herein.

Vector: As used herein, the term "vector" refers to
a nucleic acid molecule capable of transporting between
different genetic environments another nucleic acid to
which it has been operatively linked. One type of
preferred vector is an episome, i.e., a nucleic acid molecule
capable of extra-chromosomal replication. Other suitable
vectors include plasmid and cosmid vectors and phage,
especially bacteriophage such as lambda. Preferred vectors
are those capable of autonomous replication and/or
expression of nucleic acids to which they are linked. Vectors
capable of directing the expression of genes to which they
are operatively linked are referred to herein as
"expression vectors".



B. Methods 

Until this invention, genetic engineers typically
dealt with the expression of a single gene or family (or
population) of genes, one at a time. The expression of a
family of genes in a vector is generally referred to as a
"gene library." Each member of the library will normally
contain a different gene or DNA sequence. However, the
vector portion of such a vector-gene fusion is typically
identical from member to member. (Maniatis et al.,
supra). Individual members within the library may often
be, and typically are, amplified before screening to
identify and isolate a desired member. Amplification
occurs so that each library member grows as a bacterial
colony (for plasmid libraries) or phage plaque (for
bacteriophage libraries, such as lambda). These amplified
members are usually referred to as "clones," since each
colony or plaque is made up of many identical host cells
or phage particles.

The search for a particular clone containing a single
gene or DNA sequence of interest can be accomplished in
many different ways. The clone may be identified because
its vector-gene specifically hybridizes with a nucleic
acid probe. It may also be identified by expression of an
RNA species that can be identified, for example by nucleic
acid hybridization. The RNA species may, furthermore, be
translated into a protein, typically by the host cell,
that may be identified, for example, by reactivity with an
antibody probe. Alternatively, the protein may be
recognized because it binds a substrate, or catalyzes a
reaction, or allows the host cell to survive under
selective conditions, and so on.

Described herein are libraries in which two or more
families (or populations) of genes are expressed in a
vector or a host cell in such a way that the gene
combinations are randomly represented and subsequently
detected on the basis of some property or characteristic
that arises when one member from a first gene family and
one member from one or more other gene families are
combined in a vector or host cell. For
example, in the general case if there are "i" members of
the gene family "A" and "j" members of the gene family
"B", there will be (i) x (j) combinations of selected gene
members A and B in the randomly created vector-gene
population. If there are three gene families, A, B, and
C, and a vector is made containing one member from each of
the three gene families, the total number of combinations
of genes will be the product of the number of A genes
times the number of B genes times the number of C genes.
Thus, methods are provided wherein at least two genes may
randomly be combined, preferably on the same vector
molecule, and the combination identified within a
population of vectors containing other combinations of
different genes from the same two or more gene families.
This approach may be broadly accomplished by means other
than recombination, for example, the use of a vector having
at least two independent insertion sites for two foreign
genes or inserting in a vector a nucleotide sequence
comprising nucleotide sequences from each gene family. The
recombination of at least two separate library populations
to make a combinatorial population, for example, using a
common restriction site or site-directed recombination
systems, is also contemplated.
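
The combinatorial arithmetic described above can be illustrated, without limitation, by the following Python sketch. The family sizes and member names are hypothetical; the sketch simply enumerates every pairing of one member from each gene family and confirms that the library size equals the product of the family sizes.

```python
from itertools import product
from math import prod


def combinatorial_library(*families):
    """Enumerate every combination of one member from each gene family."""
    return list(product(*families))


def library_size(*families) -> int:
    """Number of combinations: the product of the family sizes."""
    return prod(len(family) for family in families)


if __name__ == "__main__":
    # Hypothetical gene families with i = 3 and j = 2 members.
    family_a = ["A1", "A2", "A3"]
    family_b = ["B1", "B2"]
    library = combinatorial_library(family_a, family_b)
    print(len(library), library_size(family_a, family_b))
```

A third family is handled the same way by passing it as an additional argument, which multiplies the number of combinations by the size of that family.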

Thus, in addition to the above-described methods, the
invention also provides for vectors having characteristics
and sequences useful for the preparation of combinatorial
vectors encoding random DNA sequences from two or more
gene families. Such vectors include plasmids and phage
containing common restriction sites or sequences enabling
the in vivo recombination of said DNA sequences from said
gene families.

The flp site-specific recombination of S. cerevisiae
has been described in Cox, Chapter 13 in "Genetic
Recombination," eds. R. Kucherlapati and G. Smith (American
Society for Microbiology 1988). Within a sixty-five bp
region identified as the recombination site and designated
FRT (flp recombination target), there are several
prominent structural features. The most important are a set
of three 13 bp repeats. The second and third repeats are
separated by one bp and are in the same orientation. The
first repeat is inverted with respect to the other two and is
separated from the second repeat by an eight bp spacer.
The first repeat also has a one bp mismatch relative to
the other two. Deletion analysis has demonstrated that
the third repeat is unnecessary for recombination in
vitro, although it may have a slight effect on the
reaction in vivo. Additional deletions indicate that most,
but not all, of the first and second repeats (those
flanking the spacer) are required. While deletion of
three bp from the distal ends of one or both of these
repeats has no detectable effect on the reaction, further
deletion leads to a gradual reduction in site function,
with complete loss of site function occurring (in vitro)
with deletions of eight bp or more from either end. The
minimal site required for full function in vitro is
therefore relatively small (approximately 28 bp including
the spacer and the proximal 10 bp of each flanking
repeat). Accordingly, it will be seen that the full,
intermediate, or minimal FRT sequences can be utilized to
accomplish flp-mediated site-specific recombination.

The lambda phage attachment site is responsible for
integration of lambda into the host chromosome. It also
acts as a hot spot of recombination in lytic crosses
between wild lambda chromosomes. As in lambda, in P1
phage a site-specific cross over site, loxP, acts as a hot
spot of recombination. This site is recognized by the P1
cre protein, a known site-specific protein. The
site-specific recombination system is responsible for the rare
integration of P1 into the host chromosome. The cre-lox
system of bacteriophage P1 is also useful for the
site-specific recombination contemplated by the invention
described and claimed herein.

A transposon can jump from one vector to another
vector or from a vector to a bacterial chromosome.
Different transposons having different inverted repeat
sequences and carrying, for example, different
drug-resistance genes, can be used to carry out the desired
random combination of genes as described herein either in
vivo or in vitro. The transposon may, but need not, also
contain a sequence encoding the transposase enzyme which
catalyzes the "hop." Various suitable transposon systems
have been described in the literature. (See, Mobile DNA,
Douglas E. Berg and Martha M. Howe, eds., American Society
for Microbiology, Washington, D.C., 1989). One suitable
transposon system is the gamma-delta transposon system
which has been isolated from E. coli.

Thus, in addition to restriction digestion and
ligation, use of flp type recombination systems, and
homologous recombination, a transposon system can also be
used to integrate a light (or heavy) antibody chain clone
into a heavy (or light) antibody chain clone. For
example, this can be accomplished by flanking the light
chain expression and cloning region with transposon
terminal sequences. A library constructed in this light
chain vector could be used to co-infect bacteria with
clones from the heavy chain library. The light chain
inserts between the terminal sequences would hop from the
light chain lambda phage vector into other DNA sequences
in the presence of transposase activity. Selection for
hopping into the heavy chain clone can be accomplished by
placing a selectable marker within the light chain,
positioned between the transposon hopping sequences.
Subsequently, phage recovered from the co-infected culture
is plated with a strain enabling selection for the heavy
chain vector and for the light chain marker gene. Because
this second plating is performed under conditions of a
high cell to phage ratio, only one lambda phage will
typically be introduced into each cell. The lambda phage
should grow only if the phage contains genes from both the
heavy and light chain clones, most efficiently as a result
of the transposon hop. If the hop occurs in the
essential genes of the heavy chain clone, the phage will not
grow. Only phage containing the transposon in the proper
position within the heavy chain will grow. A collection
of these clones comprises a library of combinatorial heavy
and light chain antibody clones.

According to one aspect of the present invention,
fusion PCR is used to generate two PCR-amplified DNA
fragments, each of which has one of its ends modified
by directed mispriming so that those ends share regions of
complementarity, i.e., cohesive termini. When the two
fragments are mixed, denatured and reannealed in a PCR
cycle, the cohesive termini on two strands hybridize to
form an "overlapping" DNA duplex that is internally
primed. The subsequent PCR cycle primer-extends the
non-overlapping regions to form a hybrid DNA molecule that is
dicistronic. See Figure 21.
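
The joining step can be modelled, purely as a simplified sketch, by treating each amplified fragment as the text of its coding strand and merging the two where the end of the upstream fragment matches the start of the downstream fragment. The function name, the minimum overlap of 8 bases and the example sequences are assumptions made for illustration; actual hybridization and primer extension act on the complementary strands of the two products.

```python
def fuse_by_overlap(upstream: str, downstream: str, min_overlap: int = 8) -> str:
    """Join two coding-strand sequences that share a terminal overlap.

    The longest suffix of `upstream` that is also a prefix of `downstream`
    stands in for the cohesive termini; the returned sequence corresponds
    to the fused product after extension of the non-overlapping regions.
    """
    longest = min(len(upstream), len(downstream))
    for size in range(longest, min_overlap - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return upstream + downstream[size:]
    raise ValueError("fragments share no terminal overlap of the required length")


if __name__ == "__main__":
    # Hypothetical fragments sharing the 11-base overlap TAATAAGGAGG.
    first_product = "ATGGCTAGC" + "TAATAAGGAGG"
    second_product = "TAATAAGGAGG" + "ACATATGAAA"
    print(fuse_by_overlap(first_product, second_product))
```

Fragments lacking a shared terminus of the required length raise an error in this sketch, mirroring the requirement that the inside ends be made complementary before fusion can occur.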

PCR amplification methods are described in detail in
U.S. Patent Nos. 4,863,192, 4,683,202, 4,800,159, and
4,965,188, and at least in several texts including "PCR
Technology: Principles and Applications for DNA
Amplification", H. Erlich, ed., Stockton Press, New York
(1989); and "PCR Protocols: A Guide to Methods and
Applications", Innis et al., eds., Academic Press, San
Diego, California (1990).

Thus, in one aspect of the present invention, fusion
PCR is used to produce a library of dicistronic DNA molecules
containing upstream and downstream cistrons wherein
first and second PCR amplification products are produced
using respective first and second PCR primer pairs. The
first PCR primer pair comprises a first polypeptide
outside primer and a first polypeptide inside primer.
Similarly, the second PCR primer pair comprises a second
polypeptide outside primer and a second polypeptide inside
primer. The first and second polypeptide inside primers
contain complementary 5'-terminal sequences that allow
their DNA complements to hybridize and form an
internally-primed duplex having 3'-overhanging termini. The
internally-primed duplex is then subjected to primer
extension reaction conditions to produce a double-stranded,
dicistronic DNA having substantially blunt or
blunt ends. The dicistronic DNA is then PCR amplified
using the outside primers as a PCR primer pair.
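
The joining step of fusion PCR can be illustrated with a toy string model. The sketch below is not a laboratory protocol: the sequences, the 12-base overlap, and the function names `reverse_complement` and `fuse_by_overlap` are all hypothetical choices made for illustration. It represents each amplification product by its coding strand and joins the two products wherever the end of the upstream product matches the start of the downstream product, which is the sequence that primer extension of the overlapping duplex would be expected to yield.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def fuse_by_overlap(upstream: str, downstream: str, min_overlap: int = 8) -> str:
    """Join two coding-strand sequences through a shared terminal overlap.

    The longest suffix of `upstream` that equals a prefix of `downstream`
    is treated as the annealed region; the fused product contains that
    region only once.
    """
    longest = min(len(upstream), len(downstream))
    for n in range(longest, min_overlap - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream + downstream[n:]
    raise ValueError("products share no terminal overlap of the required length")


# Hypothetical sequences: the overlap is the region introduced by the
# two inside primers; the flanking bases stand in for the two cistrons.
OVERLAP = "TAATAGGAGGTA"
first_product = "ATGCAGGTCCAA" + OVERLAP
second_product = OVERLAP + "ATGGACATCGTG"

fused = fuse_by_overlap(first_product, second_product)

# The 3' end of the first product's coding strand is complementary to the
# 3' end of the second product's opposite strand, which is what allows the
# two strands to anneal and prime each other.
opposite_strand = reverse_complement(second_product)
anneals = opposite_strand[-len(OVERLAP):] == reverse_complement(first_product[-len(OVERLAP):])
```

In this model the fused molecule would then be amplified with the two outside primers, which correspond to the extreme ends of `fused`.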

The dicistronic DNA molecule comprises two amino acid
residue-coding sequences on the same strand separated by
at least one stop codon and at least one signal sequence
necessary for translation of the downstream cistron, such
as a translation initiation codon, ribosome binding site,
and the like. Thus, the upstream and downstream cistrons
of the dicistronic DNA molecule are operatively linked by
a cistronic bridge. The cistronic bridge comprises the
genetic elements necessary to terminate translation of the
upstream cistron and initiate translation of the downstream
cistron. For instance, the coding strand of the
bridge codes for one or more stop codons, preferably two,
in the same translational reading frame as the upstream
cistron. The cistronic bridge coding strand preferably
also encodes a ribosome binding site for the downstream
cistron located downstream from the upstream cistron's
stop codon(s). Typically, the coding strand of the
cistronic bridge will also encode a leader polypeptide
segment in the same translational reading frame as the
downstream cistron. When present, the nucleotide base
sequence encoding the leader usually begins with an
initiation codon located within an operative distance of,
i.e., is operatively linked to, the ribosome binding site.
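
The layout of a cistronic bridge can be made concrete with a short sketch. Everything below is an assumption chosen for illustration: the two stop codons, the six-base spacer, the toy cistron sequences, and the helper names `build_bridge` and `first_in_frame_stop`. The ribosome binding site is written as the familiar `AGGAGG` motif, and the leader segment mentioned above is omitted for brevity. The code assembles a bridge and checks that the stop codons fall in the reading frame of the upstream cistron and that the initiation codon opens the reading frame of the downstream cistron.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_bridge(stops=("TAA", "TAG"), rbs="AGGAGG", spacer="AATTCT", start="ATG") -> str:
    """Concatenate stop codon(s), ribosome binding site, spacer and start codon."""
    return "".join(stops) + rbs + spacer + start


def first_in_frame_stop(seq: str, frame_start: int = 0):
    """Return the index of the first stop codon read in frame, or None."""
    for i in range(frame_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical cistrons: the upstream one is given without its own stop
# codon, the downstream one without its own start codon, because the
# bridge supplies both.
upstream_cistron = "ATGGCTGAACTG"
downstream_body = "AAAGGTCCG"

bridge = build_bridge()
dicistron = upstream_cistron + bridge + downstream_body

# Position at which the downstream reading frame opens (the ATG of the bridge).
downstream_start = len(upstream_cistron) + len(bridge) - 3

# Translation of the upstream cistron should stop exactly where the bridge begins.
upstream_terminates_at_bridge = first_in_frame_stop(dicistron) == len(upstream_cistron)
```

A check of this kind is only a frame-consistency test on the sequence; it says nothing about how efficiently a given spacer length supports initiation.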

The following discussion illustrates the use of
fusion PCR to isolate a pair of V H and V L genes from the
immunoglobulin gene repertoire. This discussion is not to
be taken as limiting, but rather as illustrating an application
of creating a novel phenotype by combining one
member from each of two or more families of genes. The
illustrated method can be used with other families of
conserved genes which each code for one unit of a dimeric
receptor, whether obtained directly from a natural source,
such as naive or in vivo immunized cells, or from cells or
one or more genes that have been treated or mutagenized in
vitro. Generally, the method combines the following
elements:

1. Producing V H and V L gene repertoires. 

2. Preparing sets of outside and inside polynucleotide
primers for cloning polynucleotide segments
containing immunoglobulin V H and V L region genes.

3. Preparing a library containing a plurality of
different dicistronic DNA molecules, each containing a V H
and a V L gene from the respective repertoires.

4. Expressing the dicistronic DNA molecules in
suitable host cells.

5. Screening the polypeptides expressed by the
dicistronic DNA molecules for the preselected activity,
and segregating a dicistronic DNA molecule identified by
the screening process.

The present invention also provides a novel
method for screening variants of a parental clone or
clones. If the parental clone or clones contain two
nucleotide sequences that, when expressed together, create
a phenotype, then such nucleotide sequences can be altered
to create populations of variants of such nucleotide
sequences. If the two variant populations are coexpressed
in a random fashion (that is, with no correlation between
the specific alterations made in the two different nucleotide
sequences), then a combinatorial collection of such
nucleotide sequence variants has been created. Such
combinatorial collections may be screened for the presence
of phenotypes that are unlike the parental clone or
clones. Generally, the method combines the following
elements:

1. Replicating a clone containing a nucleotide 
sequence under conditions that allow mutations to occur. 

2. Replicating a second clone containing a second
nucleotide sequence under conditions that allow mutations
to occur.

3. Randomly combining and co-expressing the two
mutated populations of nucleotide sequences.

4. Screening clones containing combinations of
mutated nucleotide sequences for phenotypes that were not
present in either parent clone.

Alternatively, the methods combine the following
elements:

1. Replicating at least portions of two nucleotide
sequences contained within a single clone under conditions
that allow mutations to occur in either nucleotide
sequence.

2. Allowing recombination events between the two
nucleotide sequence populations to reassociate mutant
nucleotide sequences to form new pairs of the two
sequences that were not paired in the original mutated,
replicated population.

3. Screening clones containing combinations of
nucleotide sequences for phenotypes that were not present
in the parent clone or in the mutant replicas of the
parent clone.
For example, assume a parent clone containing two
nucleotide sequences A and B is replicated under mutating
conditions such that variant clones are formed:

    Parent:    A/B
    Variant 1: A1/B
    Variant 2: A/B1
    Variant 3: A2/B1
    Variant 4: A/B2
    Variant 5: A3/B

However, within this mutated population, the combinations
A1/B2, A2/B, A2/B2, A3/B1, and A3/B2 do not occur. If
the mutant population (including some non-mutated parent
clones) is allowed to recombine sequences A and B and
their variants, then combinations such as A1/B2, A2/B, etc.
can be created. Such new combinations may express a
desired phenotype that was not present in the parental or
the variant population.
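
The pairings that recombination can add to this example population can be enumerated directly. The short sketch below is illustrative only; the variable names are arbitrary. It forms every pairing of an A variant with a B variant and subtracts the pairings already present in the mutated population. Besides the five combinations named above, the enumeration also returns A1/B1, which is likewise absent from the listed variants.

```python
from itertools import product

# Pairings present after mutagenesis (parent plus variants 1 to 5).
observed = {
    ("A", "B"),
    ("A1", "B"),
    ("A", "B1"),
    ("A2", "B1"),
    ("A", "B2"),
    ("A3", "B"),
}

# Every distinct form of each sequence seen in the population.
a_forms = sorted({a for a, _ in observed})
b_forms = sorted({b for _, b in observed})

# All pairings that free recombination between A and B could produce.
possible = set(product(a_forms, b_forms))

# Pairings that only arise through recombination.
new_pairs = sorted(possible - observed)
new_labels = [f"{a}/{b}" for a, b in new_pairs]
```

With four forms of A and three forms of B, the recombined population can hold twelve pairings, only six of which exist before recombination.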

In one aspect, the present invention is related to
methods for tapping the immunological repertoire by
isolating from V H -coding and V L -coding gene repertoires
genes coding for a heterodimeric antibody receptor capable
of binding a preselected ligand. Generally, the method
combines the following elements:

1. Isolating nucleic acids containing a substantial
portion of the immunological repertoire.

2. Preparing polynucleotide primers for cloning
polynucleotide segments containing immunoglobulin V H and V L
region genes.

3. Preparing a gene library containing a plurality
of different V H and V L genes from the repertoire.

4. Expressing the V H and V L polypeptides in a
suitable host, including prokaryotic and eukaryotic hosts,
on the same expression vector.

5. Screening the expressed polypeptides for the
preselected activity, and segregating a V H - and V L -coding
gene combination identified by the screening process.
In one aspect, the expressed phenotype produced by
the methods of the present invention comprises a multimeric
polypeptide product (i.e., a heterodimer, etc.) which
assumes a conformation having a binding site specific for,
as evidenced by its ability to be competitively inhibited,
a preselected or predetermined ligand such as an antigen,
enzymatic substrate and the like. In one embodiment, the
multimeric polypeptide is an antibody that forms an antigen
binding site which specifically binds to a preselected
antigen to form an immunoreaction product (complex) having
a sufficiently strong binding between the antigen and the
binding site for the immunoreaction product to be isolated.
The antibody typically has an affinity or avidity
that is generally greater than 10^5 M^-1.

In another embodiment, a multimeric polypeptide
produced according to the present invention is capable of
binding a substrate and catalyzes the formation of a
product from the substrate. While the topology of the
ligand binding site of a catalytic multimeric polypeptide
is probably more important for its preselected activity
than its affinity (association constant or pKa) for the
substrate, the useful catalytic multimeric polypeptides
typically have an association constant for the preselected
substrate generally greater than 10^3 M^-1, more usually
greater than 10^5 M^-1 or 10^6 M^-1 and preferably greater than
10^7 M^-1.
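
For readers who think in dissociation constants, the thresholds above can be restated using the standard relation that the dissociation constant is the reciprocal of the association constant. The sketch below is only a convenience: the tier labels and the function names `dissociation_constant` and `affinity_tier` are hypothetical, and where the text gives "10^5 M^-1 or 10^6 M^-1" the lower value is used as the cut-off.

```python
def dissociation_constant(k_a: float) -> float:
    """Return K_d in mol/L for an association constant K_a in L/mol."""
    if k_a <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / k_a


# Cut-offs taken from the paragraph above, in M^-1, strongest first.
# The labels are illustrative names, not terms used in the specification.
TIERS = [
    (1e7, "preferred"),
    (1e5, "more usual"),
    (1e3, "general"),
]


def affinity_tier(k_a: float) -> str:
    """Name the highest tier whose cut-off the association constant exceeds."""
    for cutoff, label in TIERS:
        if k_a > cutoff:
            return label
    return "below range"
```

Under this relation, an association constant of 10^7 M^-1 corresponds to a dissociation constant of 10^-7 M, i.e., 100 nM, and 10^3 M^-1 corresponds to 1 mM.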

Preferably the multimeric polypeptide produced
according to the present invention is heterodimeric and is
therefore normally comprised of two different polypeptide
chains, which together assume a conformation having a
binding affinity, or association constant, for the preselected
ligand that is different, preferably higher, than
the affinity or association constant of either of the
polypeptides alone, i.e., as monomers. In a particularly
preferred aspect, one or both of the different polypeptide
chains is derived from the variable region of the light
and heavy chains of an immunoglobulin. Typically,
polypeptides comprising the light (V L ) and heavy (V H ) variable
regions are employed together for binding the preselected
ligand.

A V H or V L produced by the methods of the subject
invention can be active in monomeric as well as multimeric
forms, either homomeric or heteromeric, preferably
heterodimeric. A V H and V L ligand binding polypeptide produced
by the present invention can be advantageously combined in
a heterodimer (antibody molecule) to modulate the activity
of either or to produce an activity unique to the
heterodimer. The individual ligand binding polypeptides will be
referred to as V H and V L and the heterodimer will be
referred to as an antibody molecule.

However, it should be understood that a V H binding
polypeptide may contain, in addition to the V H , substantially
all or a portion of the heavy chain constant
region. A V L binding polypeptide may contain, in addition
to the V L , substantially all or a portion of the light
chain constant region. A heterodimer comprised of a V H
binding polypeptide containing a portion of the heavy
chain constant region and a V L binding polypeptide containing
substantially all of the light chain constant region is termed a
Fab fragment. The production of a Fab can be advantageous
in some situations because the additional constant region
sequences contained in a Fab as compared to an Fv could
stabilize the V H and V L interaction. Such stabilization
could cause the Fab to have higher affinity for antigen.
In addition, the Fab is more commonly used in the art and
thus there are more commercial antibodies available to
specifically recognize a Fab.

The individual V H and V L polypeptides may be produced
in lengths equal or substantially equal to their naturally
occurring lengths. However, the individual V H and V L
polypeptides will generally have fewer than 125 amino acid
residues, more usually fewer than about 120 amino acid
residues, while normally having greater than 60 amino acid
residues, usually greater than about 95 amino acid
residues, more usually greater than about 100 amino acid
residues. Preferably, the V H will be from about 110 to
about 125 amino acid residues in length while V L will be from
about 95 to about 115 amino acid residues in length.

The amino acid residue sequences of the polypeptides
will vary widely, depending upon the particular idiotype
involved. Usually, there will be at least two cysteines
separated by from about 60 to 75 amino acid residues and
joined by a disulfide bond. The polypeptides produced by
the subject invention will normally be substantial copies
of idiotypes of the variable regions of the heavy and/or
light chains of immunoglobulins, but in some situations a
polypeptide may contain random mutations in amino acid
residue sequences in order to advantageously improve the
desired activity.
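
The length and cysteine-spacing guidelines given above lend themselves to a simple sequence check. The sketch below is a rough filter under stated assumptions, not a validated classifier: "separated by" is read as the number of residues lying between the two cysteines, the preferred length ranges are treated as hard bounds, and the names `PREFERRED_LENGTH`, `length_ok` and `cysteine_pairs` are hypothetical. The test sequence is a synthetic placeholder, not a real variable region.

```python
# Preferred lengths in amino acid residues, as stated in the text.
PREFERRED_LENGTH = {"VH": (110, 125), "VL": (95, 115)}


def length_ok(seq: str, chain: str) -> bool:
    """Check a one-letter amino acid sequence against the preferred length range."""
    low, high = PREFERRED_LENGTH[chain]
    return low <= len(seq) <= high


def cysteine_pairs(seq: str, min_gap: int = 60, max_gap: int = 75):
    """List index pairs of cysteines with min_gap..max_gap residues between them."""
    positions = [i for i, residue in enumerate(seq) if residue == "C"]
    pairs = []
    for k, first in enumerate(positions):
        for second in positions[k + 1:]:
            between = second - first - 1  # residues strictly between the two
            if min_gap <= between <= max_gap:
                pairs.append((first, second))
    return pairs


# Synthetic 120-residue placeholder with two cysteines 70 residues apart.
toy_domain = "A" * 21 + "C" + "A" * 70 + "C" + "A" * 27
candidate_pairs = cysteine_pairs(toy_domain)
```

A sequence that passes both checks is merely consistent with the stated dimensions; whether the two cysteines actually form a disulfide bond cannot be inferred from spacing alone.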

In some situations, it is desirable to provide for
covalent cross linking of the V H and V L polypeptides, which
can be accomplished by providing cysteine residues at the
carboxyl termini. The polypeptide will normally be prepared
free of the immunoglobulin constant regions; however,
a small portion of the J region may be included as a
result of the advantageous selection of DNA synthesis
primers. The D region will normally be included in the
transcript of the V H .

In other situations, it is desirable to provide a
peptide linker to connect the V L and the V H to form a
single-chain antigen-binding protein comprised of a V H and
a V L . This single-chain antigen-binding protein would be
synthesized as a single protein chain. Such single-chain
antigen-binding proteins have been described by Bird
et al., Science, 242:423-426 (1988). The design of
suitable peptide linker regions is described in U.S.
Patent No. 4,704,692 by Robert Ladner.

Such a peptide linker may be designed as part of the
nucleic acid sequences contained in the expression vector.
The nucleic acid sequences coding for the peptide linker
would be between the V H and V L DNA homologs and the